Measuring Disclosure Risk and Information Loss in Population Based Frequency Tables

Authors

  • László Antal
  • Natalie Shlomo
  • Mark Elliot
Abstract

Frequency tables disseminated by statistical agencies have always been of high interest. However, the agencies have to ensure that the risk of identifying individuals and disclosing individuals’ attributes from the released data is low. Therefore, they assess the risk of disclosure and apply statistical disclosure control (SDC) methods if necessary. The main objective of this work is to measure disclosure risk in population based frequency tables. The disclosure risk assessment of such tables is often based on the so-called threshold rule. According to this rule, a cell of the table is of high disclosure risk if the cell value does not exceed a certain threshold, for example 2. In this work we propose to measure the disclosure risk in an alternative way. Our approach takes the entire table (and rows/columns of the table) into consideration. We introduce a disclosure risk measure, which is based on information theoretical definitions, such as the entropy and the conditional entropy. There are two main types of SDC methods. Pre-tabular methods, such as record swapping, alter the values of a variable (or more variables) for selected individuals ...
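The contrast drawn in the abstract, between flagging individual cells and looking at the whole table, can be illustrated with a minimal Python sketch. It is not the authors' risk measure, which the abstract does not specify. It only computes the threshold rule as stated and the textbook Shannon entropy and conditional entropy of a table of counts. The function names and the example table are hypothetical, and the literal reading of the rule used here also flags empty cells.

```python
import math


def risky_cells(table, threshold=2):
    """Threshold rule as worded in the abstract: flag every cell whose
    count does not exceed the threshold. Read literally, this also
    flags empty cells (an assumption of this sketch)."""
    return [(i, j)
            for i, row in enumerate(table)
            for j, count in enumerate(row)
            if count <= threshold]


def entropy(counts):
    """Shannon entropy, in bits, of the distribution given by counts.
    Zero counts contribute nothing to the sum."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def conditional_entropy(table):
    """Textbook H(column | row): the entropy of each row's distribution,
    weighted by the share of the population in that row."""
    total = sum(sum(row) for row in table)
    return sum((sum(row) / total) * entropy(row)
               for row in table if sum(row) > 0)


# Hypothetical 3x3 table of population counts.
table = [[10, 1, 0],
         [4, 4, 4],
         [0, 0, 8]]

flagged = risky_cells(table)        # cell-by-cell view
row_entropies = [entropy(row) for row in table]
h_col_given_row = conditional_entropy(table)  # whole-table view
```

In this example the third row has zero entropy because all of its units fall in a single cell, so knowing the row determines the column. The threshold rule, by contrast, flags only that row's empty cells and says nothing about the cell holding 8.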

Similar resources

WP. 33 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS EUROPEAN COMMISSION STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)

In order to manage the disclosure risk in frequency tables containing population counts, the tables undergo statistical disclosure control (SDC) methods. This results in information loss. We examine quantitative information loss measures for frequency tables and compare them across different SDC methods. We show examples of the information loss measures on real UK 2001 Census tables after they ...

Disclosure Risk Measurement with Entropy in Two-Dimensional Sample Based Frequency Tables

We extend a disclosure risk measure defined for population based frequency tables to sample based frequency tables. The disclosure risk measure is based on information theoretical expressions, such as entropy and conditional entropy, that reflect the properties of attribute disclosure. To estimate the disclosure risk of a sample based frequency table we need to take into account the underlying ...

Statistical Disclosure Control Methods for Census Frequency Tables

This paper provides a review of common statistical disclosure control (SDC) methods implemented at Statistical Agencies for standard tabular outputs containing whole population counts from a Census (either enumerated or based on a register). These methods include record swapping on the microdata prior to its tabulation and rounding of entries in the tables after they are produced. The approach ...

A posteriori Disclosure Risk Measure for Tabular Data Based on Conditional Entropy

Statistical database protection, also known as Statistical Disclosure Control (SDC), is a part of information security which tries to prevent published statistical information (tables, individual records) from disclosing the contribution of specific respondents. This paper deals with the assessment of the disclosure risk associated to the release of tabular data. So-called sensitivity rules are...

A Measure of Disclosure Risk for Tables of Counts

The paper describes a new method for assessing disclosure risk for tables of counts; the subtraction-attribution probability (SAP) method. The SAP score is the probability of an intruder recovering a 'risky' subpopulation table given a quantity of information about the individuals in a population table. The method can be applied to exact or perturbed individual tables and sets of tables. The me...


Publication date: 2014